The results are based on answers to the main life evaluation question asked in the poll. The dataset comes from the Kaggle website. The happiness score is based on the Cantril ladder: respondents are asked to think of a ladder on which the best possible life for them is a 10 and the worst possible life is a 0, and to rate their own current lives on that scale. The data is gathered by country, and each country's region is specified as well. The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption, and generosity) influences happiness among the respondents.
The dataset was imported as a CSV file, a convenient format to work with. The libraries attached below were used to analyze the dataset with our chosen models.
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(rworldmap)))
suppressWarnings(suppressMessages(library(ggmap)))
suppressWarnings(suppressMessages(library(ggcorrplot)))
suppressWarnings(suppressMessages(library(ggpubr)))
suppressWarnings(suppressMessages(library(plotly)))
suppressWarnings(suppressMessages(library(gapminder)))
suppressWarnings(suppressMessages(library(forcats)))
suppressWarnings(suppressMessages(library(magrittr)))
World_happiness <- read.csv("world-happiness-report-2021 (1).csv")
library(reactable)
reactable(World_happiness)
str(World_happiness)
## 'data.frame': 149 obs. of 20 variables:
## $ ï..Country.name : chr "Finland" "Denmark" "Switzerland" "Iceland" ...
## $ Regional.indicator : chr "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
## $ Ladder.score : num 7.84 7.62 7.57 7.55 7.46 ...
## $ Standard.error.of.ladder.score : num 0.032 0.035 0.036 0.059 0.027 0.035 0.036 0.037 0.04 0.036 ...
## $ upperwhisker : num 7.9 7.69 7.64 7.67 7.52 ...
## $ lowerwhisker : num 7.78 7.55 7.5 7.44 7.41 ...
## $ Logged.GDP.per.capita : num 10.8 10.9 11.1 10.9 10.9 ...
## $ Social.support : num 0.954 0.954 0.942 0.983 0.942 0.954 0.934 0.908 0.948 0.934 ...
## $ Healthy.life.expectancy : num 72 72.7 74.4 73 72.4 73.3 72.7 72.6 73.4 73.3 ...
## $ Freedom.to.make.life.choices : num 0.949 0.946 0.919 0.955 0.913 0.96 0.945 0.907 0.929 0.908 ...
## $ Generosity : num -0.098 0.03 0.025 0.16 0.175 0.093 0.086 -0.034 0.134 0.042 ...
## $ Perceptions.of.corruption : num 0.186 0.179 0.292 0.673 0.338 0.27 0.237 0.386 0.242 0.481 ...
## $ Ladder.score.in.Dystopia : num 2.43 2.43 2.43 2.43 2.43 2.43 2.43 2.43 2.43 2.43 ...
## $ Explained.by..Log.GDP.per.capita : num 1.45 1.5 1.57 1.48 1.5 ...
## $ Explained.by..Social.support : num 1.11 1.11 1.08 1.17 1.08 ...
## $ Explained.by..Healthy.life.expectancy : num 0.741 0.763 0.816 0.772 0.753 0.782 0.763 0.76 0.785 0.782 ...
## $ Explained.by..Freedom.to.make.life.choices: num 0.691 0.686 0.653 0.698 0.647 0.703 0.685 0.639 0.665 0.64 ...
## $ Explained.by..Generosity : num 0.124 0.208 0.204 0.293 0.302 0.249 0.244 0.166 0.276 0.215 ...
## $ Explained.by..Perceptions.of.corruption : num 0.481 0.485 0.413 0.17 0.384 0.427 0.448 0.353 0.445 0.292 ...
## $ Dystopia...residual : num 3.25 2.87 2.84 2.97 2.8 ...
dim(World_happiness)
## [1] 149 20
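The `ï..` prefix on the first column name in the output above is typical of a CSV file that starts with a UTF-8 byte-order mark. The following is a minimal sketch of one way to avoid it, assuming the file is UTF-8 with a BOM. The temporary file and its two rows are hypothetical stand-ins for the real dataset.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (EF BB BF).
# The file and its contents are illustrative only, not the real dataset.
csv_path <- tempfile(fileext = ".csv")
con <- file(csv_path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Country name,Ladder score\nFinland,7.842\nDenmark,7.620\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" asks R to drop the mark before parsing the header.
clean <- read.csv(csv_path, fileEncoding = "UTF-8-BOM")
print(names(clean))
```

If the data were loaded this way, the first column would be called `Country.name`, and every later reference to `ï..Country.name` would have to be renamed to match.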
summary(World_happiness)
## ï..Country.name Regional.indicator Ladder.score
## Length:149 Length:149 Min. :2.523
## Class :character Class :character 1st Qu.:4.852
## Mode :character Mode :character Median :5.534
## Mean :5.533
## 3rd Qu.:6.255
## Max. :7.842
## Standard.error.of.ladder.score upperwhisker lowerwhisker
## Min. :0.02600 Min. :2.596 Min. :2.449
## 1st Qu.:0.04300 1st Qu.:4.991 1st Qu.:4.706
## Median :0.05400 Median :5.625 Median :5.413
## Mean :0.05875 Mean :5.648 Mean :5.418
## 3rd Qu.:0.07000 3rd Qu.:6.344 3rd Qu.:6.128
## Max. :0.17300 Max. :7.904 Max. :7.780
## Logged.GDP.per.capita Social.support Healthy.life.expectancy
## Min. : 6.635 Min. :0.4630 Min. :48.48
## 1st Qu.: 8.541 1st Qu.:0.7500 1st Qu.:59.80
## Median : 9.569 Median :0.8320 Median :66.60
## Mean : 9.432 Mean :0.8147 Mean :64.99
## 3rd Qu.:10.421 3rd Qu.:0.9050 3rd Qu.:69.60
## Max. :11.647 Max. :0.9830 Max. :76.95
## Freedom.to.make.life.choices Generosity Perceptions.of.corruption
## Min. :0.3820 Min. :-0.28800 Min. :0.0820
## 1st Qu.:0.7180 1st Qu.:-0.12600 1st Qu.:0.6670
## Median :0.8040 Median :-0.03600 Median :0.7810
## Mean :0.7916 Mean :-0.01513 Mean :0.7274
## 3rd Qu.:0.8770 3rd Qu.: 0.07900 3rd Qu.:0.8450
## Max. :0.9700 Max. : 0.54200 Max. :0.9390
## Ladder.score.in.Dystopia Explained.by..Log.GDP.per.capita
## Min. :2.43 Min. :0.0000
## 1st Qu.:2.43 1st Qu.:0.6660
## Median :2.43 Median :1.0250
## Mean :2.43 Mean :0.9772
## 3rd Qu.:2.43 3rd Qu.:1.3230
## Max. :2.43 Max. :1.7510
## Explained.by..Social.support Explained.by..Healthy.life.expectancy
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.6470 1st Qu.:0.3570
## Median :0.8320 Median :0.5710
## Mean :0.7933 Mean :0.5202
## 3rd Qu.:0.9960 3rd Qu.:0.6650
## Max. :1.1720 Max. :0.8970
## Explained.by..Freedom.to.make.life.choices Explained.by..Generosity
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.4090 1st Qu.:0.105
## Median :0.5140 Median :0.164
## Mean :0.4987 Mean :0.178
## 3rd Qu.:0.6030 3rd Qu.:0.239
## Max. :0.7160 Max. :0.541
## Explained.by..Perceptions.of.corruption Dystopia...residual
## Min. :0.0000 Min. :0.648
## 1st Qu.:0.0600 1st Qu.:2.138
## Median :0.1010 Median :2.509
## Mean :0.1351 Mean :2.430
## 3rd Qu.:0.1740 3rd Qu.:2.794
## Max. :0.5470 Max. :3.482
Density plots of each factor are an effective way to understand the distribution of each independent variable.
density_social_support <- density(World_happiness$Social.support)
density_Healthy.life.expectancy <- density(World_happiness$Healthy.life.expectancy)
density_Freedom.to.make.life.choices <- density(World_happiness$Freedom.to.make.life.choices)
density_Generosity <- density(World_happiness$Generosity)
density_Perceptions.of.corruption <- density(World_happiness$Perceptions.of.corruption)
par(mfrow=c(3,2))
# Draw each density curve, then fill the area under it with polygon().
plot(density_social_support, main="Density of Social support", col="purple")
polygon(density_social_support, col="orange")
plot(density_Healthy.life.expectancy, main="Density of Healthy.life.expectancy")
polygon(density_Healthy.life.expectancy, col="green")
plot(density_Freedom.to.make.life.choices, main="Density of Freedom.to.make.life.choices")
polygon(density_Freedom.to.make.life.choices, col="yellow")
plot(density_Generosity, main="Density of Generosity")
polygon(density_Generosity, col="purple")
plot(density_Perceptions.of.corruption, main="Density of Perceptions.of.corruption")
polygon(density_Perceptions.of.corruption, col="blue")
df<-data.frame(x=World_happiness$ï..Country.name, y=World_happiness$Ladder.score)
ladder_score<-World_happiness$Ladder.score
countries<-World_happiness$ï..Country.name
plot_geo(df,locationmode ='country names') %>%
add_trace(locations=~countries,z=~ladder_score,color=~ladder_score)%>%
layout(title="COUNTRIES HAPPINESS SCORE MAP")
We can observe from our map that countries differ from one another in their happiness scores. For some countries no data is available, so those areas are colored white. Some regions, such as America, parts of Europe, and Australia, seem to have a higher level of happiness. Asia, on the other hand, ranks in the middle, while the majority of Africa is in the lower happiness ranking.
A t test examines the difference in means. It is frequently used in hypothesis testing to see whether a variable has an effect on a population of interest, or whether two groups differ from one another.
Our goal is to determine whether logged GDP per capita (GDP = gross domestic product, per person) affects the ladder score of happiness. We use group_by to group the countries by region and compute each region's mean GDP per capita. Looking at the means alone, we can't be sure whether the difference is reliable. To put it simply: does the fact that my GDP per capita is higher make me more likely to be happier than you, now or in the future?
t.test(World_happiness$Ladder.score, World_happiness$Logged.GDP.per.capita, mu=0, alternative="greater", conf.level=0.95, var.equal=FALSE, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: World_happiness$Ladder.score and World_happiness$Logged.GDP.per.capita
## t = -30.13, df = 294.31, p-value = 1
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -4.112918 Inf
## sample estimates:
## mean of x mean of y
## 5.532839 9.432208
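A two-sample t test like the one above compares the mean of the ladder score with the mean of logged GDP. It does not measure whether the two variables move together. The sketch below uses synthetic data (not the happiness dataset) to show the difference, with a correlation test as one possible alternative. The variable names and the simulated relationship are assumptions made for illustration only.

```r
set.seed(1)  # synthetic data, not the happiness dataset
gdp   <- rnorm(100, mean = 9.4, sd = 1.2)
score <- 0.7 * gdp - 1 + rnorm(100, sd = 0.5)  # score rises with gdp by construction

# Welch t test: asks whether mean(score) is greater than mean(gdp).
tt <- t.test(score, gdp, alternative = "greater")

# Pearson correlation test: asks whether score and gdp are positively associated.
ct <- cor.test(score, gdp, alternative = "greater")

cat("t test p-value:     ", tt$p.value, "\n")
cat("correlation:        ", unname(ct$estimate), "\n")
cat("correlation p-value:", ct$p.value, "\n")
```

In this simulated case the t test has a p-value close to 1 even though the score depends strongly on GDP, simply because the scores sit on a lower scale than the GDP values.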
region<-group_by(World_happiness,Regional.indicator)
s<-summarise(region, meanGDP=mean(Logged.GDP.per.capita,na.rm = TRUE),meanScore=mean(Ladder.score,na.rm=TRUE))
show(s)
## # A tibble: 10 x 3
## Regional.indicator meanGDP meanScore
## <chr> <dbl> <dbl>
## 1 Central and Eastern Europe 10.1 5.98
## 2 Commonwealth of Independent States 9.40 5.47
## 3 East Asia 10.4 5.81
## 4 Latin America and Caribbean 9.37 5.91
## 5 Middle East and North Africa 9.67 5.22
## 6 North America and ANZ 10.8 7.13
## 7 South Asia 8.68 4.44
## 8 Southeast Asia 9.42 5.41
## 9 Sub-Saharan Africa 8.08 4.49
## 10 Western Europe 10.8 6.91
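A note on missing values when averaging: `mean()` drops `NA` values only when the argument is spelled exactly `na.rm`. Any other spelling is silently absorbed by `...` and has no effect. The short sketch below uses a made-up vector to show this.

```r
x <- c(10.8, NA, 8.1)  # made-up values with one missing entry

wrong <- mean(x, NA.RN = TRUE)  # misspelled argument is ignored, result stays NA
right <- mean(x, na.rm = TRUE)  # NA is dropped before averaging

print(wrong)
print(right)
```

The summary table above has no missing means, which suggests the two columns contain no `NA` values, but the spelling matters as soon as one appears.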
p<- ggplot(region,aes(y=Logged.GDP.per.capita))+geom_boxplot()+labs(title = "GDP PER CAPITA BOXPLOT")
ggplotly(p)
above_median<-World_happiness%>%
transmute(ï..Country.name,above_median_GDP=Logged.GDP.per.capita>median(Logged.GDP.per.capita))
The boxplot shows the median GDP and the shape of the distribution.
Next we plot each region's mean GDP against its mean happiness score. The point color identifies the region and the point size reflects the region's mean happiness score.
pointPlot<-ggplot(data =s,aes(x=meanGDP,y= meanScore))+
geom_point(aes(color= Regional.indicator,size=meanScore))+
geom_smooth()+labs(title="HAPPINESS SCORE PER REGION\nSHOWN BY MEAN GDP")
ggplotly(pointPlot)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
From our plot we can see that as the mean GDP gets higher, the mean ladder score gets higher too, which supports our hypothesis. Note that the t test above did not reject its null hypothesis (p-value = 1), because it only compares the mean of the ladder score with the mean of logged GDP. We used geom_point to show the mean GDP and mean score of each region; the size of each point also represents the region's mean happiness score. We added plotly for comfortable navigation of the data, to show the region name and values.
MS<-ggplot(s, aes(x=meanScore)) + geom_density() + geom_vline(data=s,aes(xintercept=mean(meanScore),colour="SCORE"),linetype="dashed", size=1)+labs(title ="SCORE MEAN DENSITY")
MG<-ggplot(s, aes(x=meanGDP)) +geom_density()+geom_vline(data=s, aes(xintercept=mean(meanGDP),colour="GDP"),linetype="dashed", size=1)+labs(title ="GDP MEAN DENSITY")
ggplotly(MS)
ggplotly(MG)
The numeric axis in the first plot is the mean ladder score by region, with a dashed line marking the mean across all regions; the second plot shows the same for GDP. We chose geom_density because its peaks mark the highest concentration of points. We added ggplotly so that it is easier to explore the data presented.
var(s$meanScore)
## [1] 0.7838723
var(s$meanGDP)
## [1] 0.783198
We chose to demonstrate the relationship between the countries' happiness scores and the six factors that were used as categories when the score was built.
Multiple regression is used to predict an outcome variable y on the basis of multiple predictor variables.
select_criteria <- c("Ladder.score",
"Logged.GDP.per.capita",
"Social.support",
"Healthy.life.expectancy",
"Freedom.to.make.life.choices",
"Generosity",
"Perceptions.of.corruption")
# all_of() makes it explicit that select_criteria is an external character vector.
Criteria_for_happiness <- World_happiness %>%
select(all_of(select_criteria))
corr <- round(cor(Criteria_for_happiness[2:7]),1)
happiness_correlation_matrix <- head(corr[,1:5])
ggcorrplot(corr) %>%
ggplotly()
As we can observe, some variables are strongly correlated and others only weakly. We use ggcorrplot to highlight the most correlated pairs.
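To complement the heat map, the most correlated pairs can also be listed directly from the correlation matrix. The sketch below is one way to do that in base R. It uses four columns of the built-in `mtcars` data as a hypothetical stand-in for the happiness factors, and the names `upper` and `pairs_df` are illustrative.

```r
# Stand-in data: four numeric columns from the built-in mtcars set.
vars <- mtcars[, c("mpg", "wt", "hp", "qsec")]
corr <- round(cor(vars), 1)

# Blank out the lower triangle and the diagonal so each pair appears once.
upper <- corr
upper[lower.tri(upper, diag = TRUE)] <- NA

# Reshape to one row per pair and sort by absolute correlation, strongest first.
pairs_df <- as.data.frame(as.table(upper), stringsAsFactors = FALSE)
names(pairs_df) <- c("var1", "var2", "r")
pairs_df <- pairs_df[!is.na(pairs_df$r), ]
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
print(pairs_df)
```

The same steps would apply to a matrix built from the happiness factors, where the first rows of the sorted table name the pairs that stand out in the plot.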
pairs(~Logged.GDP.per.capita+Social.support+
Healthy.life.expectancy+Freedom.to.make.life.choices+Perceptions.of.corruption+Generosity, data=Criteria_for_happiness, main='Independent variables scatterplots', col=c('red','blue'))
There are linear patterns between some pairs of variables, such as the strong relationship between “GDP per capita” and “Healthy life expectancy”. As approved for our project, even though not all of the assumptions are met, we will examine the multiple regression model.
Multiple_Regression_model <- lm(Ladder.score ~ Logged.GDP.per.capita+ Social.support + Healthy.life.expectancy + Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption, data = World_happiness)
summary(Multiple_Regression_model)
##
## Call:
## lm(formula = Ladder.score ~ Logged.GDP.per.capita + Social.support +
## Healthy.life.expectancy + Freedom.to.make.life.choices +
## Generosity + Perceptions.of.corruption, data = World_happiness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85049 -0.30026 0.05735 0.33368 1.04878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.23722 0.63049 -3.548 0.000526 ***
## Logged.GDP.per.capita 0.27953 0.08684 3.219 0.001595 **
## Social.support 2.47621 0.66822 3.706 0.000301 ***
## Healthy.life.expectancy 0.03031 0.01333 2.274 0.024494 *
## Freedom.to.make.life.choices 2.01046 0.49480 4.063 7.98e-05 ***
## Generosity 0.36438 0.32121 1.134 0.258541
## Perceptions.of.corruption -0.60509 0.29051 -2.083 0.039058 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5417 on 142 degrees of freedom
## Multiple R-squared: 0.7558, Adjusted R-squared: 0.7455
## F-statistic: 73.27 on 6 and 142 DF, p-value: < 2.2e-16
The Estimate column in the coefficients table gives the estimated intercept (b0) and the estimated slope for each independent variable.
To analyze the multiple regression, we examined the F-statistic and its associated p-value.
As we can see, the p-value of the F-statistic is smaller than 2.2e-16, which means that the model is highly significant. To examine all the predictor variables, we use coefficients() to display the estimates of the regression.
coefficients(Multiple_Regression_model) # model coefficients
## (Intercept) Logged.GDP.per.capita
## -2.23721929 0.27953290
## Social.support Healthy.life.expectancy
## 2.47620585 0.03031381
## Freedom.to.make.life.choices Generosity
## 2.01046470 0.36438194
## Perceptions.of.corruption
## -0.60509177
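To make the estimates concrete, the sketch below plugs one country's values into the fitted equation by hand. The coefficients are copied from the output above. The predictor values are Finland's, taken from the `str()` output earlier, where they are shown rounded, so the result is only approximate. The vector names are illustrative.

```r
# Coefficients copied from the model output above.
coefs <- c(intercept  = -2.23722,
           gdp        =  0.27953,
           social     =  2.47621,
           life_exp   =  0.03031,
           freedom    =  2.01046,
           generosity =  0.36438,
           corruption = -0.60509)

# Finland's values as printed (rounded) by str(); the leading 1 multiplies the intercept.
finland <- c(1, 10.8, 0.954, 72, 0.949, -0.098, 0.186)

# Predicted ladder score = b0 + b1*x1 + ... + b6*x6
predicted <- sum(coefs * finland)
print(round(predicted, 2))
```

With these rounded inputs the prediction comes out at about 7.09, compared with Finland's observed ladder score of 7.84. The gap is that country's residual.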
anova(Multiple_Regression_model) #Anova table
## Analysis of Variance Table
##
## Response: Ladder.score
## Df Sum Sq Mean Sq F value Pr(>F)
## Logged.GDP.per.capita 1 106.463 106.463 362.7575 < 2.2e-16 ***
## Social.support 1 8.320 8.320 28.3503 3.869e-07 ***
## Healthy.life.expectancy 1 3.476 3.476 11.8455 0.0007596 ***
## Freedom.to.make.life.choices 1 8.769 8.769 29.8807 2.009e-07 ***
## Generosity 1 0.713 0.713 2.4306 0.1212116
## Perceptions.of.corruption 1 1.273 1.273 4.3383 0.0390577 *
## Residuals 142 41.674 0.293
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(Multiple_Regression_model)
The quality of the model can be assessed by examining the R-squared and the RSE (residual standard error).
In our model, the R-squared is the squared correlation coefficient between the observed values of the total score (y) and the fitted (predicted) values of y.
We would like to calculate the RSE-based error rate, which measures the error of prediction of the score for all countries.
sigma(Multiple_Regression_model)/mean(World_happiness$Ladder.score)
## [1] 0.09791358
Our error rate (the RSE divided by the mean ladder score) is lower than 0.1; the lower this value, the more accurate the model.
Using the partial F test, we determine the difference between the multiple regression model and a nested model that contains all the factors except the Freedom to make life choices factor. Below is the model without the freedom factor.
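The two quality measures can be reproduced by hand, which helps show what they mean. The sketch below fits a small model on the built-in `mtcars` data (a stand-in, not the happiness data) and checks that R-squared equals the squared correlation between observed and fitted values, and that `sigma()` equals the square root of the residual sum of squares divided by the residual degrees of freedom.

```r
# Stand-in model on built-in data; the predictors are chosen only for illustration.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared two ways: from summary() and as a squared correlation.
r2_from_summary <- summary(fit)$r.squared
r2_from_cor     <- cor(mtcars$mpg, fitted(fit))^2

# Residual standard error two ways: from sigma() and by hand.
rse         <- sigma(fit)
rse_by_hand <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Error rate: RSE relative to the mean of the response.
error_rate <- rse / mean(mtcars$mpg)

print(c(r2_from_summary, r2_from_cor))
print(c(rse, rse_by_hand))
print(error_rate)
```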
partial_Multiple_Regression_model <- lm(Ladder.score ~ Logged.GDP.per.capita+ Social.support + Healthy.life.expectancy + Generosity + Perceptions.of.corruption, data = World_happiness)
summary(partial_Multiple_Regression_model)
##
## Call:
## lm(formula = Ladder.score ~ Logged.GDP.per.capita + Social.support +
## Healthy.life.expectancy + Generosity + Perceptions.of.corruption,
## data = World_happiness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.62939 -0.32554 0.04462 0.36965 1.24574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.30347 0.61814 -2.109 0.03671 *
## Logged.GDP.per.capita 0.26374 0.09134 2.888 0.00449 **
## Social.support 3.21205 0.67720 4.743 5.04e-06 ***
## Healthy.life.expectancy 0.03711 0.01393 2.665 0.00859 **
## Generosity 0.64851 0.33007 1.965 0.05138 .
## Perceptions.of.corruption -0.92182 0.29464 -3.129 0.00213 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5704 on 143 degrees of freedom
## Multiple R-squared: 0.7275, Adjusted R-squared: 0.7179
## F-statistic: 76.34 on 5 and 143 DF, p-value: < 2.2e-16
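To examine the F-test, we calculated the sums of squares that enter the F test statistic. The values below are the explained (regression) sums of squares: the squared deviations of the fitted values from the mean ladder score.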
Full_model_rrs <- sum((fitted(Multiple_Regression_model)- mean(World_happiness$Ladder.score))^2)
partial_Multiple_Regression_model
##
## Call:
## lm(formula = Ladder.score ~ Logged.GDP.per.capita + Social.support +
## Healthy.life.expectancy + Generosity + Perceptions.of.corruption,
## data = World_happiness)
##
## Coefficients:
## (Intercept) Logged.GDP.per.capita
## -1.30347 0.26374
## Social.support Healthy.life.expectancy
## 3.21205 0.03711
## Generosity Perceptions.of.corruption
## 0.64851 -0.92182
partial_model_rrs <- sum((fitted(partial_Multiple_Regression_model)- mean(World_happiness$Ladder.score))^2)
Full_model_rrs
## [1] 129.0157
partial_model_rrs
## [1] 124.1705
Next we ran the F test using the ANOVA table. ANOVA estimates how a dependent variable changes according to the levels of one or more independent variables.
anova(Multiple_Regression_model, partial_Multiple_Regression_model)
## Analysis of Variance Table
##
## Model 1: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption
## Model 2: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## Generosity + Perceptions.of.corruption
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 142 41.674
## 2 143 46.520 -1 -4.8452 16.509 7.978e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Under the null hypothesis of the partial F test, the freedom factor adds nothing to the model. Given the results, we have evidence that the full model is significantly better, because Pr(>F) is 7.978e-05.
The analysis showed that happiness, as subjective as it is, proves to be complex and dependent on a variety of factors when translated into quantitative variables.
After we analyzed the factors that affect the ladder score, we identified a linear fit between them. The data shows linear correlation between the factors, and the regional means support our hypothesis that higher GDP per capita goes with higher happiness. Although not all the assumptions of multiple regression were met, we observed the correlations of all the happiness factors with the happiness score across 149 countries. Through the partial F-test we found that the full model is significantly better than the partial model (without Freedom to make life choices).
From our perspective, this data captures individuals' feelings about their happiness as a measure of quality of life around the globe. However, we do acknowledge that happiness is subjective: you may live in an area where the score is lower, yet you might feel happier, and for you the glass is half full.
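The F value in the ANOVA table can be recomputed from the residual sums of squares and degrees of freedom it reports. The sketch below does this with the rounded numbers printed above, so the result matches the table only up to rounding. It then checks the same formula against `anova()` on a stand-in pair of nested models from the built-in `mtcars` data. The helper `partial_f` is a hypothetical name introduced here.

```r
# Partial F statistic for nested models:
# F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full)
partial_f <- function(rss_reduced, rss_full, df_reduced, df_full) {
  ((rss_reduced - rss_full) / (df_reduced - df_full)) / (rss_full / df_full)
}

# Rounded values copied from the ANOVA table above.
f_stat  <- partial_f(rss_reduced = 46.520, rss_full = 41.674,
                     df_reduced = 143, df_full = 142)
p_value <- pf(f_stat, df1 = 1, df2 = 142, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))

# Cross-check of the formula on stand-in nested models (not the happiness data).
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)
by_hand <- partial_f(deviance(reduced), deviance(full),
                     df.residual(reduced), df.residual(full))
from_anova <- anova(reduced, full)$F[2]
print(c(by_hand = by_hand, from_anova = from_anova))
```

Because only one term is dropped, this F value is also the square of the t value reported for Freedom.to.make.life.choices in the full model (4.063 squared is about 16.51), and the two p-values agree.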